106 research outputs found

    An efficient hardware architecture for H.264 adaptive deblocking filter algorithm

    Get PDF
    This paper presents an efficient hardware architecture for real-time implementation of adaptive deblocking filter algorithm used in H.264 video coding standard. This hardware is designed to be used as part of a complete H.264 video coding system for portable applications. We use a novel edge filter ordering in a Macroblock to prevent the deblocking filter hardware from unnecessarily waiting for the pixels that will be filtered become available. The proposed architecture is implemented in Verilog HDL. The Verilog RTL code is verified to work at 72 MHz in a Xilinx Virtex II FPGA. The FPGA implementation can code 30 CIF frames (352x288) per second

    High performance hardware architecture for half-pixel accurate H.264 motion estimation

    Get PDF
    In this paper, we present a high performance and low cost hardware architecture for real-time implementation of half-pel accurate variable block size motion estimation for H.264 / MPEG4 Part 10 video coding. The proposed architecture includes a novel half-pel interpolation hardware that is shared by novel half-pel search hardwares designed for each block size. This half-pel accurate motion estimation hardware is designed to be used as part of a complete H.264 video coding system for portable applications. The proposed architecture is implemented in Verilog HDL. The Verilog RTL code is verified to work at 85 MHz in a Xilinx Virtex II FPGA. The FPGA implementation can process 30 HDTV frames (1280x720) per second

    A High performance and low power hardware architecture for H.264 cavlc algorithm

    Get PDF
    In this paper, we present a high performance and low power hard-ware architecture for real-time implementation of Context Adap-tive Variable Length Coding (CAVLC) algorithm used in H.264 / MPEG4 Part 10 video coding standard. This hardware is designed to be used as part of a complete low power H.264 video coding system for portable applications. The proposed architecture is im-plemented in Verilog HDL. The Verilog RTL code is verified to work at 76 MHz in a Xilinx Virtex II FPGA and it is verified to work at 233 MHz in a 0.18´ ASIC implementation. The FPGA and ASIC implementations can code 22 and 67 VGA frames (640x480) per second respectively

    A reconfigurable frame interpolation hardware architecture for high definition video

    Get PDF
    Since Frame Rate Up-Conversion (FRC) is started to be used in recent consumer electronics products like High Definition TV, real-time and low cost implementation of FRC algorithms has become very important. Therefore, in this paper, we propose a low cost hardware architecture for realtime implementation of frame interpolation algorithms. The proposed hardware architecture is reconfigurable and it allows adaptive selection of frame interpolation algorithms for each Macroblock. The proposed hardware architecture is implemented in VHDL and mapped to a low cost Xilinx XC3SD1800A-4 FPGA device. The implementation results show that the proposed hardware can run at 101 MHz on this FPGA and consumes 32 BRAMs and 15384 slices

    An efficient hardware architecture for H.264 intra prediction algorithm

    Get PDF
    In this paper, we present an efficient hardware architecture for real-time implementation of intra prediction algorithm used in H.264 / MPEG4 Part 10 video coding standard. The hardware design is based on a novel organization of the intra prediction equations. This hardware is designed to be used as part of a complete H.264 video coding system for portable applications. The proposed architecture is implemented in Verilog HDL. The Verilog RTL code is verified to work at 90 MHz in a Xilinx Virtex II FPGA. The FPGA implementation can process 27 VGA frames (640x480) per second

    A High performance and low cost hardware arcitecture for H.264 transform and quantization algorithms

    Get PDF
    In this paper, we present a high performance and low cost hardware architecture for real-time implementation of forward transform and quantization and inverse transform and quantization algorithms used in H.264 / MPEG4 Part 10 video coding standard. The hard-ware architecture is based on a reconfigurable datapath with only one multiplier. This hardware is designed to be used as part of a complete low power H.264 video coding system for portable appli-cations. The proposed architecture is implemented in Verilog HDL. The Verilog RTL code is verified to work at 81 MHz in a Xilinx Virtex II FPGA and it is verified to work at 210 MHz in a 0.18´ ASIC implementation. The FPGA and ASIC implementations can code 27 and 70 VGA frames (640x480) per second respectively

    A high performance hardware architecture for one bit transform based motion estimation

    Get PDF
    Motion Estimation (ME) is the most computationally intensive part of video compression and video enhancement systems. One bit transform (IBT) based ME algorithms have low computational complexity. Therefore, in this paper, we propose a high performance systolic hardware architecture for IBT based ME. The proposed hardware performs full search ME for 4 Macroblocks in parallel and it is the fastest IBT based ME hardware reported in the literature. In addition, it uses less on-chip memory than the previous IBT based ME hardware by using a novel data reuse scheme and memory organization. The proposed hardware is implemented in Verilog HDL. It consumes %34 of the slices in a Xilinx XC2VP30-7 FPGA. It works at 115 MHz in the same FPGA and is capable of processing 50 1920x1080 full High Definition frames per second. Therefore, it can be used in consumer electronics products that require real-time video processing or compression

    A High permormance hardware architecture for an sad reuse based hierarchical motion estimation algorithm for H.264 video coding

    Get PDF
    In this paper, we present a high performance and low cost hardware architecture for real-time implementation of an SAD reuse based hierarchical motion estimation algorithm for H.264 / MPEG4 Part 10 video coding. This hardware is designed to be used as part of a complete H.264 video coding system for portable applications. The proposed architecture is implemented in Verilog HDL. The Verilog RTL code is verified to work at 68 MHz in a Xilinx Virtex II FPGA. The FPGA implementation can process 27 VGA frames (640x480) or 82 CIF frames (352x288) per second

    An efficient FPGA implementation of versatile video coding intra prediction

    Get PDF
    Versatile Video Coding (VVC) is a new international video compression standard offering much better compression efficiency than previous video compression standards at the expense of much higher computational complexity. In this paper, an efficient FPGA implementation of VVC intra prediction for angular prediction modes of 4x4, 8x8, 16x16 and 32x32 prediction unit sizes is proposed. In the proposed FPGA implementation, four constant multiplications used in one intra angular prediction equation are implemented using two DSP blocks and two adders in FPGA. The proposed FPGA implementation of VVC intra prediction, in the worst case, can process 34 full HD (1920x1080) frames per second

    Efficient hardware implementations of low bit depth motion estimation algorithms

    Get PDF
    In this paper, we present efficient hardware implementation of multiplication free one-bit transform (MF1BT) based and constraint one-bit transform (C-1BT) based motion estimation (ME) algorithms, in order to provide low bit-depth representation based full search block ME hardware for real-time video encoding. We used a source pixel based linear array (SPBLA) hardware architecture for low bit depth ME for the first time in the literature. The proposed SPBLA based implementation results in a genuine data flow scheme which significantly reduces the number of data reads from the current block memory, which in turn reduces the power consumption by at least 50% compared to conventional 1BT based ME hardware architecture presented in the literature. Because of the binary nature of low bit-depth ME algorithms, their hardware architectures are more efficient than existing 8 bits/pixel representation based ME architectures
    corecore